Visualising Outliers in Nominal Data
Abstract
A scatter plot is a useful method for visualising clusters and outliers in continuous data. However, it cannot be applied directly to nominal data, because nominal values have no natural ordering or 'distance'. One solution is to map the multi-dimensional nominal data to a numeric space and then draw a scatter plot of the data points using the first two principal components of that space. This paper reports a study of how such plots can be generated using three types of mapping: (a) Binary Input Mapping (BImap), (b) Attribute Value Frequency Mapping (AVFmap), and (c) BImap combined with AVFmap. Results show that the combined method draws on the complementary strengths of BImap and AVFmap to generate meaningful scatter plots for visualising categorical outliers, and that it achieves the highest information gain among the methods tested.
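The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes BImap is a one-hot (binary indicator) encoding, that AVFmap replaces each value with its relative frequency within its attribute, and that the combined mapping simply concatenates the two. The function names `bimap`, `avfmap`, and `first_two_pcs`, and the toy data, are hypothetical.

```python
import numpy as np

def bimap(rows):
    """Assumed BImap: one binary indicator column per (attribute, value) pair."""
    columns = []
    for j in range(len(rows[0])):
        for value in sorted({row[j] for row in rows}):
            columns.append([1.0 if row[j] == value else 0.0 for row in rows])
    return np.array(columns).T

def avfmap(rows):
    """Assumed AVFmap: replace each value by its relative frequency in its attribute."""
    n = len(rows)
    out = np.empty((n, len(rows[0])))
    for j in range(len(rows[0])):
        counts = {}
        for row in rows:
            counts[row[j]] = counts.get(row[j], 0) + 1
        out[:, j] = [counts[row[j]] / n for row in rows]
    return out

def first_two_pcs(X):
    """Project the rows of X onto the first two principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

# Toy nominal data (hypothetical): the last record uses rare values.
rows = [("red", "small")] * 4 + [("red", "large"), ("blue", "large")]

# Assumed combination: concatenate the two numeric representations.
combined = np.hstack([bimap(rows), avfmap(rows)])
coords = first_two_pcs(combined)  # one (x, y) point per record, ready to plot
```

Each row of `coords` can then be passed to any plotting library as a scatter-plot point; records with rare attribute values would be expected to sit apart from the main cluster.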
Similar resources
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using the vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to incorrect modeling, biased parameter estimates, and inaccurate predictions. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
Methods for Identifying Outliers in Medical Studies
Background: An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. Outliers sometimes lead to abnormalities in the results obtained from collected data. Recognising outlier data is important for researchers, physicians, and others who work in the medical sciences, and they must check the data before getting result a...
Visualising multilevel models: the Initial Analysis of Data
This paper considers the use of the freeware package XLISP-STAT (Tierney, 1990) and the add-on package ARC (Cook and Weisberg, 1999) to explore multilevel data structures before more formal modelling. Exploratory data analysis (Chatfield, 1988) is regarded as essential before further analysis. XLISP-STAT is highly interactive and with ARC can be used to investigate multilevel structures graph...
Impact of Outliers in Data Envelopment Analysis
This paper examines the relationship between "Data Envelopment Analysis" and the statistical concept of an "outlier". Data envelopment analysis (DEA) is a method for estimating the relative efficiency of decision making units (DMUs) that perform similar tasks in a production system, using multiple inputs to produce multiple outputs. An important issue in statistics is to identify the outliers. In this pap...
Robust Logistic and Probit Methods for Binary and Multinomial Regression.
In this paper we introduce new robust estimators for logistic and probit regression with binary, multinomial, nominal, and ordinal data, and apply these models to estimate the parameters when outliers or influential observations are present. Maximum likelihood estimates do not behave well when outliers or influential observations are present. One remedy is to remove influential observations from ...